Abstract


This progress report describes work completed in the first two years of a proposed
three-year study of the discrimination and identification of complex sounds. Support
was terminated at the end of the second year. Projects number 2 and 4 in the
"Research Accomplishments" section were initiated with NMRDC funds. The remaining
projects are part of ongoing investigations with partial funding from AFOSR and NIH.
Much of the effort in the first year was devoted to the installation of a multi-user PDP
11 computer system to be used for program development, data analysis, and synthesis
of complex sounds. This installation involved the integration of the PDP 11 with existing
computer facilities as well as with newly acquired Apollo workstations. Considerable
effort was also devoted to the development of programs for experimental control
and data analysis. In addition, experiments were completed in four topic areas. (1)
Work on auditory processing capacity limitations was extended using threshold values
of Δf/f and ΔI (dB) as dependent measures. New experiments revealed that the proportion
of the total pattern duration occupied by a target component accounts for large-scale
changes in frequency discrimination performance previously attributed to variation
in the number of components within a pattern. (2) New experiments on the
discriminability of noise samples have shown that the discriminability of differences in
pairs of noise samples depends on the duration and location of the deviant portion of
noise within a noise sample, replicating a similar finding obtained earlier with tonal
patterns. (3) Internal noise is assumed by signal detection theories to account for
less-than-perfect detection and discrimination performance. A model has been developed in
which the internal noise has been partitioned into peripheral and central components.
That model was tested in experiments in which the external stimulus distributions were
rigidly controlled. (4) Preliminary studies with complex stimuli developed for use in
vigilance  experiments  have  shown  that  listeners  are  able  to  integrate  information  across 
multiple  components  of  multidimensional  sounds,  with  little  or  no  loss  due  to  increasing 
the number of components over a range of 1 to 7. These experiments have also
shown  large  individual  differences  in  subjects’  tendencies  to  base  their  decisions  on 
specific  dimensions  of  complex  sounds.  Initial  attempts  to  train  listeners  to  attend  to 
dimensions  that  were  initially  unattended  (by  giving  listeners  extended  practice  under 
conditions  in  which  only  the  to-be-learned  dimension  could  vary  over  trials)  resulted  in 
little  or  no  improvement  for  most  listeners.  These  experiments  have  generated  several 
forms  of  useful  information  for  operational  applications.  They  provide  a  large  amount 
of  data  on  the  level  and  range  of  performance  that  can  be  expected  of  highly  trained 
human  listeners  whose  assignments  require  them  to  detect,  discriminate,  or  identify 
complex  sounds.  They  provide  "benchmarks"  for  the  selection  of  unusually  salient  or 
identifiable  acoustic  signals.  Last,  they  provide  theoretical  models  of  auditory  discrimi¬ 
nation  and  decision  making  which  are  not  limited  to  the  classes  of  acoustic  events  used 
in  the  laboratory  experiments  conducted  to  develop  and  test  these  models. 


Specific  Aims 

Four  experimental  projects  were  proposed,  each  dealing  with  a  different  aspect  of 
listeners’  abilities  to  detect  or  identify  complex  sounds.  The  purpose  of  this  work  was 
twofold:  (1)  to  ascertain  those  properties  of  acoustical  waveforms  which  determine  per¬ 
formance  limits  for  human  listeners  and  (2)  to  determine  optimal  training  procedures 
for  auditory  "monitoring"  or  watchstanding  tasks.  The  general  approach  is  that  of  con¬ 
temporary  psychoacoustic  research,  oriented  by  the  Theory  of  Signal  Detectability 
(Robinson  and  Watson,  1972). 

1.  Pattern  discrimination  and  identification.  These  studies  continued  a  series  of 
experiments  on  the  total  amount  of  information  listeners  are  able  to  extract  from  com¬ 
plex  sounds.  Experiments  extended  previous  work  in  order  to  quantify  the  informa¬ 
tional  limits,  or  "channel  capacity"  of  the  human  listener,  toward  the  development  of  a 
general  model  of  complex  auditory  processing.  Such  a  model  could  be  useful  both  for 
the  selection  of  optimal  preprocessing  of  SONAR  signals,  and  to  predict  the  limits  of 
listeners’  performance  from  the  acoustic  properties  of  a  signal  catalog. 

2.  Detection  and  Discrimination  of  Noise  Signals.  Listeners’  abilities  to  detect, 
discriminate between, and identify Gaussian or pseudo-Gaussian noise samples were
examined  using  procedures  similar  to  those  used  with  multi-tone  patterns.  The  experi¬ 
ments  were  designed  to  test  a  preliminary  model  based  on  an  assumed  bank  of  critical- 
bandwidth  filters,  wherein  the  output  of  each  filter  is  given  specific  weightings  by  indi¬ 
vidual  listeners.  It  is  also  assumed  that  various  temporal  portions  of  the  waveform  are 
similarly  weighted.  Models  based  on  responses  to  noise  samples  are  more  likely  to 
characterize  the  actual  transfer  characteristics  of  the  human  auditory  system  than  are 
those  based  on  more  constrained  stimuli. 

3.  Detection  and  Identification  without  Defined  Observation  Intervals.  The  free- 
response  testing  procedure  of  Watson  and  Nichols  (1976)  was  to  be  used  to  study  the 
detection  and  identification  of  noise  signals  presented  at  random  times.  Preliminary 
testing  with  complex  sounds  was  completed,  but  funding  was  terminated  before  these 
experiments  could  be  carried  out. 

4.  Optimization  of  Training  Procedures.  Training  procedures  that  have  led  to  per¬ 
ceptual  enhancement  of  specific  spectral-temporal  regions  of  multi-tone  patterns  (i.e., 
reducing  trial-to-trial  stimulus  variation  or  allowing  the  listener  to  directly  control  the 
acoustic  properties  of  the  signal  in  an  early  phase  of  auditory  training)  were  to  be 
applied  in  an  effort  to  minimize  the  time  required  to  learn  to  detect  and  identify  specific 
classes  of  noise  signals  and  other  complex  sounds.  Only  preliminary  training  studies 
with  complex  sounds  were  completed  before  funding  was  terminated.  These  early 
results  demonstrated  changes  in  sensitivity  that  often  ranged  from  near-chance  perfor¬ 
mance  in  early  testing  to  near-perfect  performance  following  training. 


Scientific  Significance 

This  research  has  provided  needed  information  about  the  limits  of  hearing  for  com¬ 
plex  sounds.  Most  of  what  is  known  about  normal  hearing  has  been  learned  through 
experiments  with  simple  stimuli:  pure  tones,  noise  bursts,  or  clicks.  Our  practical  con¬ 
cerns  with  human  hearing,  however,  are  based  in  the  difficulty  listeners  exhibit  in 
extracting  information  from  complex  sounds.  Knowledge  of  the  ability  to  detect  infor¬ 
mation  in  complex  sounds  may  reduce  some  of  the  mystery  that  surrounds  the  human 
listener’s  ability  to  process  special  classes  of  sounds  (e.g.,  speech,  music,  SONAR). 

These  experiments  are  part  of  an  ongoing  investigation  of  the  perception  of  com¬ 
plex  sounds  (see  the  original  proposal  for  a  brief  review)  utilizing  various  types  of 
stimuli,  such  as  rapid,  multi-tone  patterns,  bursts  of  reproducible  Gaussian  noise,  and 
other  synthetic  multidimensional  sounds,  including  speech.  These  experiments  have 
allowed  us  to  identify  some  of  the  critical  stimulus  parameters  that  limit  listeners’  abili¬ 
ties  to  detect  various  aspects  of  complex  sounds,  such  as  the  degree  of  uncertainty 
involved  in  a  listening  condition  and  the  spectral-temporal  location  of  information  to  be 
detected  within  a  complex  sound.  New  experiments  are  helping  us  to  more  precisely 
define  the  stimulus  parameters  that  influence  the  perception  of  complex  sounds.  In 
addition  we  have  begun  to  study  listeners’  ability  to  integrate  information  in  sequences 
of  multidimensional  sounds  and  have  documented  certain  striking  individual  differences 
in  the  ability  to  base  judgments  on  specific  dimensions  of  multidimensional  stimuli. 

This  information  has  contributed  to  the  development  of  a  general  model  of  auditory 
pattern  recognition  and  discrimination  that  will  have  relevance  beyond  the  domain  of 
simple, isolated sounds.


Research Accomplishments


1.  Auditory  processing  capacity  for  tonal  sequences 

A  series  of  experiments  on  the  informational  capacity  of  the  auditory  system  for 
temporal  patterns  has  now  been  completed.  These  experiments  were  originally  con¬ 
ceived  as  a  means  of  determining  the  optimal  combination  (for  information  transmis¬ 
sion)  of  total-pattern  and  pattern-component  durations.  The  total  information  in  a 
pattern  is  considered  to  be  proportional  to  the  number  of  independently  varying  com¬ 
ponents.  The  patterns  in  each  of  these  studies  consisted  of  a  series  of  75-dB  tone 
pulses, whose frequencies were randomly selected from the range 300-3000 Hz. Successive
tones  in  the  sequences  were  never  closer  than  1/3  octave,  and  were  gated  on  and  off 
with  a  2.5  msec  rise-decay. 
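
The frequency-selection rule described above can be illustrated with a minimal sketch. It is not the laboratory's actual synthesis program: the function name, the rejection-sampling approach, and the choice to draw frequencies uniformly on a logarithmic axis are all assumptions made here for illustration, since the report states only that frequencies were randomly selected from 300-3000 Hz with successive tones at least 1/3 octave apart.

```python
import math
import random


def tone_sequence(n, lo=300.0, hi=3000.0, min_sep_oct=1 / 3, rng=None):
    """Return n tone frequencies (Hz) for one hypothetical pattern.

    Assumption: frequencies are drawn uniformly in log-frequency; the
    report does not say whether the draw was linear or logarithmic.
    """
    rng = rng or random.Random()
    freqs = []
    while len(freqs) < n:
        f = lo * (hi / lo) ** rng.random()
        # Reject a candidate closer than min_sep_oct octaves to the
        # immediately preceding tone, then draw again.
        if freqs and abs(math.log2(f / freqs[-1])) < min_sep_oct:
            continue
        freqs.append(f)
    return freqs


if __name__ == "__main__":
    print([round(f) for f in tone_sequence(8, rng=random.Random(1))])
```

Only the separation between successive tones is constrained, as in the text; non-adjacent tones may still fall close together in frequency.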

An  earlier  series  of  studies  conducted  in  our  laboratory  used  an  adaptive-tracking 
procedure  in  which  the  number  (n)  of  components  in  fixed-duration  tonal  patterns  was 
increased or decreased from trial to trial, in a S/2AFC discrimination task. (S/2AFC:
a paradigm in which a standard pattern is followed by two test patterns, one of which
is  different  from  the  standard.)  Patterns  with  six  total  durations,  from  62.5  to  2000 
msec,  were  presented  in  random  order,  while  an  independent  adaptive-tracking  history 
for  each  duration  converged  on  the  n’s  (number  of  components)  required  for  71% 
discrimination.  As  the  numbers  of  components  in  the  patterns  were  varied,  the  dura¬ 
tion  of  the  individual  components  was  always  the  total  duration/n.  This  procedure 
was  repeated  in  seven  separate  experiments. 
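
A minimal sketch of one such tracking history follows. The report does not state the tracking rule; a two-correct/one-error rule is assumed here because rules of that form are known to converge near 70.7% correct, which matches the 71% figure quoted above. The function name and the deterministic example listener are hypothetical, and the sketch follows a single track rather than the six interleaved tracks (one per total duration) used in the experiments.

```python
def track_components(respond, n_start=2, n_min=1, n_max=40, trials=60):
    """Adaptive track on n, the number of components per pattern.

    A larger n makes the task harder, so n is raised after two
    consecutive correct responses and lowered after any error
    (assumed rule, not taken from the report).
    Returns the n presented on each trial.
    """
    n = n_start
    correct_run = 0
    history = []
    for _ in range(trials):
        history.append(n)
        if respond(n):
            correct_run += 1
            if correct_run == 2:
                n = min(n + 1, n_max)
                correct_run = 0
        else:
            n = max(n - 1, n_min)
            correct_run = 0
    return history


if __name__ == "__main__":
    # Hypothetical listener who is always correct up to 7 components.
    print(track_components(lambda n: n <= 7))
```

With the deterministic listener in the example, the track climbs from the starting value and then alternates between 7 and 8 components; a real listener's track would fluctuate around the threshold value of n.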

Results of these experiments suggest that when tonal patterns can be discriminated
by the presence of a silent gap, or by a change in gap position, performance is determined
by a critical component duration (25-50 msec, depending on the specific task).
In those cases, the threshold values of n, for each total pattern duration, are approximately
the total duration divided by a constant. In experiments in which discrimination
requires some degree of resolution of the actual pitch contour, performance seems
to  reflect  a  fixed  informational  capacity  for  pattern  discrimination,  in  the  range  of  6-9 
components  per  pattern.  No  clear  optimal  combination  of  total  and  component  dura¬ 
tion  can  be  seen  in  these  data,  since  the  same  6-9  component  limit  is  found  for  a  32-fold 
range  of  total  durations  (62.5-2000  msec). 
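
The two regimes described above can be written compactly. The symbols are introduced here for illustration only and do not appear in the original report: T is the total pattern duration, d_c the critical component duration, and n_θ the threshold number of components.

```latex
% Duration-limited tasks (gap presence or gap position):
n_{\theta}(T) \approx \frac{T}{d_c}, \qquad d_c \approx 25\text{--}50\ \text{msec}

% Capacity-limited tasks (resolution of the pitch contour):
n_{\theta}(T) \approx 6\text{--}9 \quad \text{for } 62.5 \le T \le 2000\ \text{msec},
\qquad \frac{2000}{62.5} = 32
```

As a purely arithmetic illustration of the first relation, T = 500 msec with d_c = 50 msec would give a threshold of about 10 components, whereas the second relation predicts roughly the same threshold at every total duration in the range tested.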

1.1.  Isochronous  vs.  anisochronous  patterns  (Watson,  Foyle) 

The  results  of  the  above  experiments  were  obtained  with  isochronous  patterns 
(duration  of  each  component  =  total  duration/n).  The  relative  constancy  of  the  total 
information  in  discriminable  patterns  ranging  from  62.5  msec  to  2000  msec  might  be  a 
property  only  of  patterns  which  have  the  very  salient  rhythmic  quality  associated  with 
isochronous  temporal  structure.  A  major  difference  in  discriminability  for  isochronous 
and  anisochronous  patterns  might  be  predicted  by  the  results  of  a  recent  experiment 
reported by Sorkin (J. Acoust. Soc. Am. 75, S21, 1984). To investigate that possibility,
a new experiment was conducted, in which the random sequences of patterns
included three levels of temporal "jitter" of the non-target component durations. In the
resulting  anisochronous  patterns  the  target-component  durations  (the  component  whose 
frequency  was  incremented)  were  still  total  duration/n,  while  the  non-target  com¬ 
ponents  were  each  randomly  increased  or  decreased  in  duration  by  a  fixed  percentage  of 
their  isochronous  value.  The  "jitter"  percentages  were  0%,  30%,  or  50%,  in  separate 
conditions.  It  was  found  that  threshold  values  of  n  were  unaffected  by  the  two  levels  of 
anisochrony,  although  the  perceptual  quality  of  the  patterns  was  markedly  changed  by 
these manipulations. Sorkin’s study differed from this experiment in that he studied
the effects of within-trial variation in the temporal structure of patterns; thus these
results do not directly contradict his.
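
The construction of the anisochronous patterns can be sketched as follows. The function and parameter names are hypothetical. The sketch assumes that each non-target component is lengthened or shortened by exactly the stated percentage with equal probability, and it makes no attempt to hold the total pattern duration constant, since the report does not say how (or whether) that was done.

```python
import random


def component_durations(total_ms, n, target_index, jitter, rng=None):
    """Durations (msec) of the n components of one hypothetical pattern.

    The target keeps its isochronous duration (total_ms / n); every
    other component is scaled by (1 + jitter) or (1 - jitter), chosen
    at random. jitter is a proportion, e.g. 0.3 for the 30% condition.
    """
    rng = rng or random.Random()
    base = total_ms / n
    durations = []
    for i in range(n):
        if i == target_index:
            durations.append(base)
        else:
            sign = rng.choice((-1.0, 1.0))
            durations.append(base * (1.0 + sign * jitter))
    return durations


if __name__ == "__main__":
    print(component_durations(500.0, 5, 2, 0.3, random.Random(0)))
```

Setting jitter to 0 reproduces the isochronous patterns of the earlier experiments.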

1.2.  Capacity  estimated  in  a  true  frequency-discrimination  paradigm  (Watson, 
Kidd) 

In the above experiments, the dependent variable was n (the number of components
in a constant-duration tonal pattern), an unusual psychophysical procedure
that is, to our knowledge, unique to these experiments. While we know of no theoretical
reason that this paradigm would yield aberrant results compared to more traditional
methods, it nevertheless seemed reasonable to attempt to estimate the
pattern-discrimination "capacity" using a more traditional psychophysical approach. Another
experiment was therefore conducted, in which the dependent variable in the
adaptive-tracking procedure was Δf/f, the proportional change in the frequency of a
mid-temporal-position, mid-frequency component. Threshold values (71% correct) of Δf/f were
determined for various numbers of components, for total pattern durations of 125, 500, and
1500  msec.  Although  there  is  some  reduction  in  threshold  as  total  pattern  duration 
(and  therefore  component  duration,  in  these  isochronous  patterns)  is  increased,  that 
effect  is  extremely  small  compared  to  the  changes  associated  with  variation  in  n. 

Taken  together,  the  results  of  these  experiments  suggest  a  limit  on  pattern  process¬ 
ing  in  terms  of  the  total  amount  of  information  contained  in  tonal  patterns,  rather 
than  in  terms  of  critical  values  of  some  physical  parameters.  Such  a  processing  limit  is 
reminiscent of Miller’s (1956) "magical number 7 ± 2," and of the results of some of
Pollack’s  (1953)  experiments  on  the  information  in  multi-dimensional  auditory  displays. 
It  extends  that  earlier  work  to  complex  temporal  auditory  stimuli.  These  limits  appear 
to  be  general  at  least  for  stimuli  in  the  range  of  pattern  durations  thus  far  investigated 
(62.5-2000  msec),  but  only  for  cases  in  which  discrimination  must  be  based  on  the  con¬ 
tents  of  immediate  memory.  When  the  listener  has  some  long-term  basis  for  focusing 
attention  on  a  restricted  portion  of  a  complex  pattern,  then  these  informational  limits 
do  not  yield  accurate  predictions  of  performance.  When  the  information-processing 
demands  are  reduced,  as  by  permitting  successful  use  of  top-down  direction  of  attention 
(e.g.,  Spiegel  and  Watson,  1981)  considerably  greater  amounts  of  stimulus  information 
may  be  included  in  discriminable  patterns.  The  predictability  of  the  waveforms  of 
speech  (or  of  most  music)  thus  affects  the  applicability  of  the  limited-capacity 
hypothesis to such familiar and highly constrained stimuli.

1.3. Detection of level changes in multi-tone patterns (Watson, Kidd, Washburne)

The  experiments  on  listeners’  processing  capacity  have  now  been  extended  to  the 
detection  of  changes  in  the  level  of  individual  tones  in  multi-tone  patterns.  Listeners’ 
abilities  to  detect  increments  and  decrements  of  the  intensity  of  tones  were  examined 
with  a  range  of  sequence  lengths  (1  to  9)  and  of  total  pattern  durations  (125  to  1500 
msec).  We  found  that,  in  contrast  to  our  data  for  the  detection  of  changes  in  fre¬ 
quency,  performance  is  primarily  affected  by  individual  component  duration  with  very 
little  influence  of  number  of  tones  or  of  total  pattern  duration.  This  is  very  much  like 
the  results  of  our  earlier  experiments  on  the  detection  of  gaps  in  multi-tone  patterns. 
These  cases  have  in  common  that  it  is  not  necessary  to  attend  to  the  series  of  pitch 
changes  in  order  to  detect  the  change  in  the  pattern.  As  a  working  hypothesis,  it 
appears  that  "saturation"  of  the  processing  capacity  for  pitch  changes  has  little  or  no 
degrading  effect  on  listeners’  abilities  to  detect  changes  in  other  stimulus  dimensions. 
This  result  is  consistent  with  Pollack’s  findings  of  an  increase  in  information  transmit¬ 
ted  through  the  use  of  multi-dimensional  encoding. 

1.4.  Proportional  target-tone  duration  as  a  factor  in  the  discriminability  of  tonal 
patterns  (Watson,  Kidd) 

A  series  of  experiments  on  listeners’  abilities  to  extract  information  from  patterns 
with  varying  total  durations  and  numbers  of  tonal  components  has  previously  been 
reported [J. Acoust. Soc. Am. Suppl. 1 73, S44 (1983); 77, S1 (1985)]. In those experiments
listeners were tested in high-stimulus-uncertainty, same-different pattern discrimination
tasks, in which the tonal patterns to be discriminated differed by changes in the
frequency of one or more components. Discrimination performance in those tasks was
consistent with previous measures of the frequency resolving power of the auditory system
when the patterns contained one to three equal-duration components, for total pattern
durations from 62.5-2000 ms. As the number of components was raised, discrimination
thresholds increased by large amounts, often by factors of 10-100 for patterns
with more than seven or eight components. While this result might imply an informational
limit on pattern processing, it is also consistent with the hypothesis that target
tones  are  equally  well  resolved  if  they  occupy  equal  proportional  durations  of  the  patterns 
in  which  they  occur.  Results  of  a  new  experiment,  in  which  the  proportional  durations 
of  target  tones  and  the  number  of  tones  per  pattern  were  independently  varied,  suggest 
that  proportional  duration  of  the  target  tones  is  in  fact  the  primary  determinant  of 
pattern  discriminability  for  tonal  patterns  ranging  from  100-1500  ms  in  total  duration. 
[Abstract  of  paper  presented  at  the  114th  meeting  of  the  Acoustical  Society  of  America; 
Miami, Florida; November, 1987.]


2. Detection of pattern repetition in continuous tone-patterns (Kidd, Watson,
Washburne)

The  existence  of  a  general  processing-capacity  limitation,  as  suggested  in  the  previ¬ 
ous  studies,  does  not  mean  that  all  pattern  discrimination  tasks  would  necessarily 
reflect  that  same  limit.  We  have  therefore  investigated  listeners’  abilities  to  detect  the 
repetition of multi-tone patterns as a function of tone duration and number of tones in
the pattern. In this experiment, generally modeled after that reported by Guttman and
Julesz (1963), subjects are presented with repeating or non-repeating tonal patterns
using  a  tracking  paradigm  that  increases  or  decreases  the  number  of  tones  in  a  pattern, 
depending  on  a  subject’s  performance.  Because  of  the  possibility  that  successful  perfor¬ 
mance  of  the  task  might  be  strongly  influenced  by  detection  of  the  repetition  of  percep¬ 
tually  unique  events  within  a  pattern,  we  chose  patterns  designed  to  have  few  such 
events.  In  one  series  of  tests  we  investigated  the  effects  of  decreasing  the  bandwidth, 
intended  to  reduce  the  likelihood  of  the  occurrence  of  unique  events  that  result  from 
frequency-based auditory stream segregation. Preliminary data collection has been
completed,  utilizing  50-msec  and  200-msec  tones  with  1/3-octave  and  1-octave  pattern 
bandwidths  (centered  on  1000  Hz)  with  9  subjects  participating  in  all  conditions.  These 
data  showed  strong  effects  of  tone  duration  and  bandwidth,  as  well  as  a  significant 
interaction  (due  to  a  slightly  greater  effect  of  bandwidth  at  the  50  ms  tone  duration). 
The  number  of  components  for  which  each  subject  could  correctly  detect  repetitions 
70%  of  the  time  was  estimated  for  each  condition.  The  mean  number  of  components 
for  the  9  subjects  for  each  condition  is  shown  in  Table  1.  In  general  it  can  be  seen  that 
listeners  are  able  to  detect  the  repetition  of  patterns  consisting  of  more  tones  with  the 
shorter  tone  duration  and  the  wider  bandwidth.  Interestingly,  the  effect  of  tone  dura¬ 
tion  is  not  simply  an  effect  of  total  pattern  duration:  subjects  are  able  to  detect  the 
repetition  of  patterns  with  longer  total  durations  (but  fewer  tones)  at  the  200-msec  tone 
duration. 

Table 1. Mean number of tones for 70% correct detection of repetition
(total duration of detectable repeating patterns, in seconds, shown in
parentheses).

                      Tone Duration
Bandwidth         50 ms          200 ms

1/3 Octave     62.9 (3.16)    30.7 (6.14)
1 Octave       94.1 (4.71)    35.5 (7.10)


Despite  our  attempts  to  minimize  the  occurrence  of  unique  events,  subjects'  reports 
indicated  that  judgments  were  often  based  on  the  reoccurrence  of  particular  events 
rather  than  detection  of  whole-pattern  repetition.  To  further  reduce  the  occurrence  of 
unique  events,  a  new  version  of  this  experiment  was  developed  in  which  the  sequences  of 
pitches  of  consecutive  tones  approximated  a  sinusoidal  series.  Tones  deviated  randomly 
from  strict  sinusoidal  variation  by  ±  6%  and  a  single  repeating  pattern  spanned  three 
cycles.  This  procedure  reduces  the  possibility  of  unique  events  by  constraining  adjacent 
tone  relations  while  eliminating  the  problems  of  pattern-restart  discontinuities  and 
gross  changes  in  pattern  macrostructure. 
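
A minimal sketch of how such a quasi-sinusoidal pitch sequence might be generated is given below. Several details are assumptions rather than facts from the report: the sinusoid is taken to lie on a log-frequency axis centered on 1000 Hz with a one-octave span, the ±6% deviation is drawn uniformly within that range for each tone, and all names are hypothetical.

```python
import math
import random


def sinusoidal_pattern(n_tones, center_hz=1000.0, span_oct=1.0, cycles=3,
                       deviation=0.06, rng=None):
    """Frequencies (Hz) of one repeating pattern spanning `cycles` cycles.

    The phase advances by a whole number of cycles over the pattern,
    so repeating the pattern introduces no restart discontinuity in
    the underlying sinusoid.
    """
    rng = rng or random.Random()
    freqs = []
    for i in range(n_tones):
        phase = 2.0 * math.pi * cycles * i / n_tones
        nominal = center_hz * 2.0 ** (0.5 * span_oct * math.sin(phase))
        # Assumed form of the random departure from the strict sinusoid.
        factor = 1.0 + rng.uniform(-deviation, deviation)
        freqs.append(nominal * factor)
    return freqs


if __name__ == "__main__":
    print([round(f) for f in sinusoidal_pattern(12, rng=random.Random(0))])
```

Because adjacent tones follow the same slowly varying contour, no single tone stands far apart from its neighbors, which is the property the procedure was meant to exploit.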

Initial  data  collection  with  this  new  procedure  revealed  that  unique  events  were 
still being used as a basis for repetition judgments. We are currently testing a new
procedure that tracks on the deviation around the sine wave with a variable number of
tones per cycle. The goal of this series of experiments is to devise a class of tonal
sequences for which listeners must attend to the microstructure of an entire sequence to
detect the repetition of a series of tones within the sequence.


3.  Perception  of  salient  auditory  events  or  figures  (Kidd,  Watson,  Washburne) 

In  several  studies  Bregman  and  his  colleagues  (reviewed  in  Bregman,  1978)  have 
described  the  factors  associated  with  the  emergence  of  auditory  "streams"  (sets  of  ele¬ 
ments  within  a  sequence  of  sounds  that  are  more  salient  than  their  context).  Similar 
effects  have  been  noted  in  listening  to  repetition  of  the  multi-tone  sequences  used  in  our 
experiments.  A  new  series  of  experiments  has  been  designed  to  more  objectively  meas¬ 
ure  the  subpatterns  (or  auditory  "events"  or  "figures")  that  listeners  report  hearing 
when  patterns  are  repeated. 

In  an  auditory  figure-identification  procedure  (AFI),  listeners  work  at  computer 
terminals,  where  they  are  given  one-key  control  over  the  presence  or  absence  of  each  of 
the  components  of  a  tonal  pattern  (generally  10  tones).  They  check  each  tonal  com¬ 
ponent  by  turning  it  on  and  off,  to  determine  whether  that  component  is  a  part  of  an 
auditory  "figure"  that  emerges  after  the  pattern  has  been  repeated  several  times.  When 
a  component  is  identified  as  part  of  a  figure,  it  is  marked  (by  depressing  another  key), 
and  when  all  components  have  been  checked  the  listener  can  confirm  his  choices  by  a 
single key which turns on and off all non-marked components (i.e., the "ground").
Another  keystroke  causes  the  selected  subpattern  and  the  time  required  to  identify  it  to 
be  recorded. 

Results of a first experiment using the AFI procedure show excellent agreement
among  the  figures  identified  by  five  well-trained  listeners  within  a  set  of  120  patterns. 

In general, listeners identified figures within one frequency range (either high or low)
more reliably when the range of figural components was relatively more compact and more
distant from the non-figural components. The absolute frequency range of the elements
that  form  a  figure  was  not  significantly  related  to  its  salience. 

In  a  second  experiment,  the  accuracy  with  which  the  figural  and  non-figural 
(ground)  components  are  resolved  was  measured,  using  the  method  of  adjustment 
described  by  Watson  (1976).  The  frequency  of  single  components  was  adjusted  in  a 
comparison  pattern,  until  the  listener  decided  that  it  had  the  same  pitch  as  the 
corresponding  component  in  a  standard  pattern. 

In  general,  the  adjustments  of  figural  components  are  either  slightly  more  accurate 
than  for  those  that  form  the  ground,  or,  in  some  cases,  are  made  with  the  same  accu¬ 
racy,  but  require  more  pattern  repetitions  before  the  listener  is  satisfied  with  the  match. 

The  primary  goal  of  these  preliminary  experiments  was  to  devise  a  rapid  and  reli¬ 
able  means  by  which  listeners  can  identify  the  elements  of  a  pattern  which  they  per¬ 
ceive  as  a  discrete  auditory  "figure"  or  "event".  The  AFI  method  is  a  very  convenient 
means of achieving that goal. In future experiments, we plan to use that method to
study  other  factors  that  may  be  systematically  related  to  the  emergence  of  auditory 
figures or "targets" from various backgrounds.


4. Perception of multidimensional complex sounds (Watson, Kidd, Washburne)


4.1.  Information  integration  with  multidimensional  complex  sounds 

Listeners’  abilities  to  perceive  information  independently  encoded  in  different 
dimensions  of  complex  sounds  were  examined  in  experiments  that  required  simultaneous 
attention  to  three  dimensions.  Stimuli  consisted  of  sequences  of  1,  3,  5,  or  7  brief 
pulses  that  were  generated  by  adding  five  100-msec  sinusoidal  components.  Each  pulse 
had  one  of  two  values  on  each  of  the  following  complex  dimensions:  1)  harmonicity 
(harmonic  vs  inharmonic  relations  among  the  components),  2)  spectral  shape  (linearly 
decreasing  amplitude  vs  a  two-peaked  amplitude  profile),  and  3)  amplitude  envelope 
(slow vs rapid rise and decay times). Stimuli were selected such that the two values on
each  dimension  were  highly  discriminable. 

Two  types  of  stimuli  were  generated  by  designating  one  value  on  each  of  the 
dimensions  as  the  "target"  value  (harmonic  spacing  of  sinusoids,  the  double-peaked 
power  spectrum,  and  rapid  rise/decay),  and  the  other  as  the  "non-target"  value.  The 
selection of dimensions for each component of a sequence was probabilistically determined
and was adjusted to yield maximum possible (ideal) performance of 90% correct
for all sequences.

Two groups of listeners were tested for 10 days. One group had four days of
training  in  a  single-dimension  control  experiment  in  which  they  identified  target  and 
non-target  values  for  each  of  the  individual  dimensions  while  the  other  two  dimensions 
varied  randomly.  The  other  group  had  no  prior  training  but  was  tested  in  the  single¬ 
dimension  control  experiment  after  completion  of  the  main  experiment. 

Performance  in  the  training  experiment  for  the  first  group  revealed  very  substan¬ 
tial  differences  among  listeners’  abilities  to  attend  to  the  individual  dimensions,  even 
after  4  days  of  training  (240  trials  per  day  per  dimension).  All  listeners  were  able  to 
correctly  detect  differences  at  least  70%  of  the  time  for  each  dimension,  and  close  to 
100%  for  at  least  one  dimension. 

Both  groups  of  listeners  showed  an  impressive  ability  to  integrate  information  over 
pulses  within  sequences  of  up  to  7  pulses,  with  identification  performance  at  about  5% 
to  10%  below  that  of  the  ideal  (90%).  The  similarity  in  performance  of  the  two 
groups,  even  at  early  stages  of  the  experiment,  indicates  that  the  type  of  training  we 
have  used  did  not  have  a  significant  effect  on  listeners’  ability  to  perform  the 
information-integration  task. 

Performance  of  the  second  group  of  listeners  in  the  single-dimension  control  experi¬ 
ment  after  10  days  of  listening  to  the  stimuli  in  the  multi-pulse  task  was  quite  similar 
to  that  of  the  group  tested  prior  to  the  multi-pulse  task.  There  appears  to  be  little  or 
no effect of exposure in the integration task on discrimination ability as tested in the
single-dimension  control  experiment. 

In  order  to  better  understand  listeners’  attentional  strategies  in  the  integration 
task and how they might differ from those in the control experiment, responses to various
stimulus configurations (i.e., multi-component stimuli with different numbers of
target  values  on  each  dimension)  were  examined.  The  result  of  this  analysis  can  be 
summarized  as  follows: 


1.  Although  performance  levels  are  often  similar  to  levels  that  could  be  achieved  by 
attending  to  a  single  dimension  (approximately  80%),  listeners  were  clearly  attending  to 
more  than  one  dimension.  Correlations  between  the  number  of  stimuli  with  target 
values  on  a  given  dimension  and  listeners’  responses  were  computed  for  each  dimension 
and  combination  of  dimensions.  Correlations  between  responses  and  values  on  each 
individual  dimension  were  higher  than  would  be  predicted  on  the  basis  of  the  correla¬ 
tion  among  the  components,  and  correlations  with  the  sum  of  all  three  dimensions  were 
generally  higher  than  with  any  single  dimension  or  pair  of  dimensions.  In  other  words, 
the listeners’ decisions were in fact multidimensionally based.

2.  Substantial  individual  differences  were  observed  in  the  extent  to  which  listeners 
attended  to  each  of  the  dimensions.  However,  the  allocation  of  attention  suggested  by 
the  results  of  the  single-dimension  control  experiment  did  not  always  agree  with  the 
apparent  attentional  distribution  observed  in  the  integration  experiment.  In  some 
cases, the dimensions that influenced listeners’ responses most were not those that
yielded the highest performance in the control experiment (when feedback was based on a
single  dimension).  It  thus  appears  that  a  listener’s  ability  to  attend  to  a  given  dimen¬ 
sion  while  others  vary  randomly  is  not  a  good  predictor  of  his  ability  to  use  that  same 
information  when  making  decisions  based  on  the  combination  of  multiple  dimensions. 
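The correlational analysis summarized in points 1 and 2 can be illustrated with a minimal sketch. Everything about the simulated listener is an assumption made for illustration: the attention weights, the unit-variance decision noise, and the independence of the three dimensions (in the experiment the dimensions would be correlated through target and non-target stimuli). The sketch only shows how response correlations with single dimensions and with their sum can be compared.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_trials(n_trials, n_pulses, weights, rng):
    """Hypothetical listener: responds "target" (1) when a weighted, noisy count
    of target-valued pulses on the three dimensions exceeds its expected value."""
    criterion = sum(weights) * n_pulses / 2
    counts, responses = [], []
    for _ in range(n_trials):
        # Number of pulses carrying the target value on each dimension.
        c = [sum(rng.random() < 0.5 for _ in range(n_pulses)) for _ in range(3)]
        evidence = sum(w * k for w, k in zip(weights, c)) + rng.gauss(0.0, 1.0)
        counts.append(c)
        responses.append(1 if evidence > criterion else 0)
    return counts, responses

rng = random.Random(1)
counts, responses = simulate_trials(4000, 7, weights=(1.0, 0.6, 0.3), rng=rng)
per_dim = [pearson([c[d] for c in counts], responses) for d in range(3)]
summed = pearson([sum(c) for c in counts], responses)
print("per-dimension r:", [round(r, 2) for r in per_dim])
print("r with the sum of all three dimensions:", round(summed, 2))
```

In this sketch the ordering of the per-dimension correlations follows the assumed attention weights, which is the sense in which such correlations can be read as an estimate of a listener's allocation of attention.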

4.2.  Individual  differences  in  the  allocation  of  attention  to  specific  dimensions 

The  existence  of  large  individual  differences  in  the  allocation  of  attentional 
resources  to  various  dimensions  of  sound  sequences  has  recently  been  confirmed  in 
another  version  of  this  experiment.  Twenty-seven  additional  listeners  were  tested  in  a 
four-session  "screening"  protocol  in  which  they  were  trained  and  tested  in  the 
classification  of  the  three-dimensional  target  and  non-target  sounds.  Approximately  the 
same  number  of  subjects  displayed  a  preference  for  each  of  the  three  dimensions:  Ten 
preferred  spectral  shape  (the  "profile"  in  Green’s  (1983)  terms),  nine  preferred  harmoni- 
city,  and  eight  preferred  amplitude  envelope.  There  were  subjects  who  were  skilled  at 
processing  each  of  the  dimensions  but  could  not  seem  to  simultaneously  process  the 
other  two,  while  a  few  subjects  could  reliably  detect  all  three  dimensions.  The  unsatis¬ 
factory  generalization  from  these  data  was  that  there  were  many  substantially  different 
patterns  of  allocation  of  attention  to  the  three  dimensions  and  very  little  evidence  of 
clusters  of  listeners  with  similar  patterns. 

At  this  point,  this  research  has  yielded  two  primary  results  concerning  listeners’ 
ability  to  categorize  complex  multidimensional  sounds.  One  is  that  listeners  can 
integrate  multidimensional  information  in  sequences  of  sound  pulses,  with  little  or  no 
loss of efficiency with increasing sequence length, for sequences of one to seven components.
The other is that they are often not very good at allocating attention to all
three  features  (or  "dimensions"),  even  though  their  absolute  efficiency  in  this  task  is 
fairly  high.  In  fact,  comparable  levels  of  performance  are  achieved  with  a  variety  of 
patterns  of  attention  to  the  features  of  the  stimuli.  One  possible  interpretation  of  this 
result  is  that  performance  is  limited  in  terms  of  the  amount  of  information  the  listener 
can  extract  from  complex  sounds.  The  existence  of  small  negative  correlations  between 
weightings  for  two  of  the  three  pairs  of  dimensions  gives  some  support  to  this  interpre¬ 
tation. 


New  experiments  have  been  planned  to  determine  (a)  how  well  listeners  can  learn 
to  process  features  they  appeared  to  initially  ignore,  and  (b)  how  well  they  can  be 
taught  to  integrate  information  across  all  three  dimensions.  Initial  attempts  to  train 
listeners  to  attend  to  previously  unattended  dimensions  (utilizing  training  techniques 
that  encourage  listeners  to  attend  to  each  of  the  individual  features  in  the  context  of 
different  stimulus  configurations)  have  had  limited  success.  It  would  seem  very  likely 
that  this  task  can  ultimately  be  learned,  given  that  each  of  the  features  can  be  discrim¬ 
inated  easily. 

5.  Discriminability  of  complex  waveforms  (D.  E.  Robinson  and  S.  M.  Fallon) 

Watson  and  his  colleagues  have  provided  a  large  amount  of  data  concerning  the 
discriminability  of  individual  components  within  sequences  of  tonal  patterns  (Watson, 
Wroton,  Kelly,  and  Benbassat,  1975;  Watson,  Kelly,  and  Wroton,  1976;  Spiegel  and 
Watson,  1981;  Leek  and  Watson,  1984;  Watson  and  Foyle,  1983;  1985a,  1985b;  Watson 
and  Kidd,  1987).  We  are  now  investigating  the  relationship  between  such  relatively 
deterministic  tonal  sequences  and  essentially  random  waveforms.  There  appear  to  be 
some  striking  similarities  between  the  two,  apparently  quite  different,  types  of 
waveforms. 

The  research  described  under  this  heading  is  directed  toward  a  better  understand¬ 
ing  of  the  processes  by  which  listeners  discriminate  between  pairs  of  complex  auditory 
waveforms.  The  waveforms  are,  in  all  cases,  samples  of  broad-band,  white,  Gaussian 
noise, and all experiments made use of a same-different paradigm. [Portions of the work
described  here  have  been  reported  in  Fallon  and  Robinson  (1985)  and  in  Fallon  and 
Robinson  (1987).] 

5.1.  Models  of  auditory  masking 

Hit proportions and false alarm proportions for detecting a 500-Hz tone at each of
four starting phase angles in each of 25 reproducible noise samples were modeled by
fitting the general form of the electrical analog model of
Jeffress [J. Acoust. Soc. Am. 41, 480-488 (1967)] to the diotic data. The best-fitting
configurations  of  this  model  do  not  correspond  to  energy  detectors  or  to  envelope  detec¬ 
tors.  A  detector  composed  of  a  50-Hz-wide  single-tuned  filter,  followed  by  a  half-wave 
rectifier  and  an  integrator  with  an  integration  time  of  100  to  200  ms  fits  the  data  of  all 
four  subjects  relatively  well.  Linear  combinations  of  the  outputs  of  several  detectors 
that  differ  in  center-frequency  or  integration  window  provide  even  better  fits  to  the 
data.  These  linear  combinations  assign  negative  weights  to  some  frequencies  or  to  some 
time intervals, suggesting that a subject’s decision is based on a comparison of information
in different spectral or temporal portions of the stimulus. [Gilkey, R. H., & Robinson,
D. E., J. Acoust. Soc. Am., 79, 1499-1510 (1986)]
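The single-detector configuration described above (narrow filter, half-wave rectifier, integrator) can be sketched as follows. This is not the fitted model of Gilkey and Robinson: the sampling rate, the two-pole resonator used to realize a single-tuned filter, and the use of a plain average over the final 150 msec as the integrator are all assumptions chosen to keep the example short.

```python
import math

def single_tuned_filter(x, fs, fc, bw):
    """Two-pole resonator standing in for a single-tuned filter
    (centre frequency fc and bandwidth bw, both in Hz)."""
    r = math.exp(-math.pi * bw / fs)                  # pole radius sets the bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)  # pole angle sets the centre frequency
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def detector_output(x, fs, fc=500.0, bw=50.0, t_int=0.150):
    """Filter, half-wave rectify, then average over the final t_int seconds."""
    filtered = single_tuned_filter(x, fs, fc, bw)
    rectified = [max(v, 0.0) for v in filtered]
    n_int = min(len(rectified), int(round(t_int * fs)))
    return sum(rectified[-n_int:]) / n_int

def tone(freq, fs, dur):
    """Unit-amplitude sinusoid of the given frequency (Hz) and duration (s)."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(int(round(fs * dur)))]

fs = 10000.0                                          # assumed sampling rate (Hz)
on_freq = detector_output(tone(500.0, fs, 0.3), fs)
off_freq = detector_output(tone(800.0, fs, 0.3), fs)
print("500-Hz tone:", round(on_freq, 2), " 800-Hz tone:", round(off_freq, 2))
```

A linear combination of several such outputs, with different values of `fc` or different integration windows, would correspond to the multi-detector fits mentioned in the text.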

5.2.  Effect  of  random  variations  in  level 

If  the  discrimination  between  pairs  of  noise  bursts  is  based  on  a  statistic  such  as 
total  power,  average  power  or  energy,  the  discrimination  should  be  impossible  if  overall 
level is randomized between the two bursts in the same-different paradigm. The effect of
such a change was investigated at each of two durations using bursts which were either
identical  ("same"  trials)  or  completely  independent  ("different"  trials).  Within  a  block 
of  trials,  the  noise  bursts  were  either  25-  or  150-msec  in  duration  and  the  level  of  the 
sample  presented  in  one  observation  interval  was  held  constant  while  the  level  of  the 
sample  presented  in  the  other  interval  was  randomly  varied.  In  one  experimental  condi¬ 
tion,  the  level  of  one  of  the  samples  in  the  pair  was  3  dB  greater  than,  3  dB  less  than, 
or  equal  to  the  level  of  the  other  sample.  The  effect  of  a  variation  in  level  of  ±  6  dB  was 
also  examined. 

The data indicate that varying the level of one of the samples in a pair caused
only a slight decrease in discriminability. When the bursts were 150-msec in duration, the
average  value  of  d'  without  variations  in  level  was  2.98;  with  a  ±  3  dB  variation,  it  was 
2.46;  and  with  ±  6  dB,  2.09.  For  25-msec  bursts,  the  corresponding  values  of  d'  were 
3.13,  2.79,  and  2.49.  Thus,  although  there  is  a  slight  decrease  in  performance  with  ran¬ 
domized  levels,  the  samples  are  still  quite  discriminable.  We  conclude  that  the  basis  of 
the  discrimination  cannot  be  average  power  or  energy. 
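The reasoning behind this conclusion can be illustrated with a small simulation of a hypothetical observer that bases its judgment only on the level difference between the two intervals. The sampling rate, burst length in samples, and trial counts below are assumptions. Without level variation, the level difference is exactly zero on "same" trials and nonzero on "different" trials; with a ±3 dB variation, the level difference on "same" trials is as large as on "different" trials, so an energy statistic alone would no longer separate them.

```python
import math
import random

def noise_burst(n_samples, rng):
    """One burst of Gaussian noise with unit variance."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def level_db(x):
    """Level of a burst in dB re unit mean-square."""
    return 10.0 * math.log10(sum(v * v for v in x) / len(x))

def scale_db(x, gain_db):
    g = 10.0 ** (gain_db / 20.0)
    return [g * v for v in x]

def energy_statistic(n_samples, rove_db, different, rng):
    """Absolute level difference between the two intervals of one trial."""
    first = noise_burst(n_samples, rng)
    second = noise_burst(n_samples, rng) if different else list(first)
    # The level of the second interval is raised, lowered, or left unchanged.
    second = scale_db(second, rng.choice([-rove_db, 0.0, rove_db]))
    return abs(level_db(first) - level_db(second))

def mean_statistic(n_trials, n_samples, rove_db, different, rng):
    total = sum(energy_statistic(n_samples, rove_db, different, rng)
                for _ in range(n_trials))
    return total / n_trials

rng = random.Random(2)
n = 1500   # 150 msec at an assumed 10-kHz sampling rate
for rove in (0.0, 3.0):
    same = mean_statistic(100, n, rove, False, rng)
    diff = mean_statistic(100, n, rove, True, rng)
    print(f"rove {rove} dB: mean |dL| same = {same:.2f} dB, different = {diff:.2f} dB")
```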

5.3.  Effect  of  temporal  position  of  appended  noise 

Hanna  (1984)  demonstrated  that  samples  of  wide-band  reproducible  noise  are 
highly  discriminable  over  a  large  range  of  durations.  We  have  found  that  discriminabil¬ 
ity  can  be  reduced  by  increasing  the  similarity  between  the  pairs  of  samples  to  be 
discriminated.  During  "different"  trials  of  the  same-different  procedure,  the  second  sam¬ 
ple of the pair was generated by repeating a temporal segment of the sample presented
in  the  first  interval  and  combining  it  with  a  new  sample  of  noise.  The  total  duration  of 
the  second  sample  of  a  pair  is  equal  to  the  duration  of  the  new  sample  plus  the  dura¬ 
tion  of  the  repeated  sample  of  noise.  The  total  duration,  as  well  as  the  duration  and 
position  of  the  new  segment  of  noise  was  varied.  The  three  total  durations  examined 
were:  150,  50,  and  25  msec.  The  new  segment  of  noise  was  either  placed  at  the  begin¬ 
ning,  in  the  middle,  or  at  the  end  of  the  repeated  sample  of  noise. 

The  degree  of  similarity  between  the  two  samples  presented  during  a  "different" 
trial  may  be  expressed  in  terms  of  the  inter-pair  correlation  (r):  the  duration  of  the 
repeated  sample  of  noise  divided  by  the  total  duration  of  the  sample.  When  the  data 
are  expressed  in  terms  of  correlation,  the  threshold  value  of  r  is  independent  of  dura¬ 
tion,  but  is  highly  dependent  upon  the  position  of  the  appended  segment.  Although 
discriminability  was  not  affected  by  the  total  duration  of  the  sample,  the  temporal  posi¬ 
tion  of  the  new  segment  had  a  large  and  consistent  effect:  segments  placed  at  the  end 
were  more  discriminable  than  those  in  the  middle  which  were  more  discriminable  than 
those  at  the  beginning. 
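The definition of the inter-pair correlation can be made concrete with a short sketch that builds a "different" trial and compares the nominal value of r (repeated duration divided by total duration) with the Pearson correlation of the two waveforms. The sampling rate is an assumption, and so is the choice to keep the repeated portion time-aligned with the first interval; the text does not specify how the segments were combined.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def make_different_sample(first, new_len, position, rng):
    """Second-interval sample: new_len samples are fresh noise placed at the
    beginning, middle, or end; the remainder repeats `first`.
    Assumption: the repeated portion stays aligned with the first interval."""
    total = len(first)
    start = {"beginning": 0,
             "middle": (total - new_len) // 2,
             "end": total - new_len}[position]
    second = list(first)
    for i in range(start, start + new_len):
        second[i] = rng.gauss(0.0, 1.0)
    return second

def nominal_r(total_len, new_len):
    """Inter-pair correlation as defined in the text."""
    return (total_len - new_len) / total_len

rng = random.Random(3)
total = 3000    # 150 msec at an assumed 20-kHz sampling rate
new = 500       # 25-msec new segment
first = [rng.gauss(0.0, 1.0) for _ in range(total)]
for pos in ("beginning", "middle", "end"):
    second = make_different_sample(first, new, pos, rng)
    print(pos, round(nominal_r(total, new), 3), round(pearson(first, second), 3))
```

Under these assumptions the waveform correlation is the same for all three positions, which is why the position effect reported above cannot be attributed to the correlation itself.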

The  effect  of  temporal  position  on  discriminability  also  occurs  with  tonal 
sequences. Watson and his colleagues (Watson et al., 1975, 1976) showed that discriminability
increases as the location of the test tone is moved from the beginning to the
end  of  a  450  msec  tonal  pattern.  Hanna  (1984)  also  determined  that  the  discriminabil¬ 
ity  of  two  samples  of  reproducible  noise  was  dependent  on  the  temporal  positions  of  the 
repeated  and  appended  segments.  Hanna’s  data  indicate  that  discriminability  is  best  in 
the  end  condition,  decreases  in  the  beginning  condition,  and  is  worst  in  the  middle 
condition. Based on the results of the present experiment as well as on the research of
Watson  and  his  colleagues,  one  would  have  predicted  the  middle  condition  to  be  more 
discriminable  than  the  beginning  condition.  The  discrepancy  may  be  attributable  to 
procedural  differences  such  as  the  duration  of  the  samples  of  noise  or  the  degree  of 
stimulus  uncertainty. 

The  results  of  this  experiment  and  the  data  of  Watson’s  group  indicate  that  the 
processes underlying the discriminability of sequences of tonal patterns and the discriminability
of samples of reproducible noise are very similar. The just-detectable segments
of "different" noise in these experiments tend to be a constant proportion of the
total  stimulus  duration.  This  result  is  very  similar  to  the  performance  described  for 
various  duration  tonal  patterns,  in  the  "capacity"  experiments  discussed  by  Watson  and 
Foyle  (1985a)  and  by  Watson  and  Kidd  (1987).  The  fact  that  two  distinctly  different 
types  of  complex  waveforms  appear  to  be  processed  in  the  same  manner  suggests  that 
discriminability  is  dependent  on  the  more  global  characteristics  of  the  complex 
waveform  rather  than  on  the  fine  structure  of  a  specific  waveform. 

5.4.  Effect  of  decorrelation:  autocorrelation 

This  experiment  investigated  the  discriminability  of  noise-samples  which  differed  in 
their  autocorrelation.  As  in  the  previous  experiment,  "same"  trials  were  generated  by 
repeating  in  the  second  interval  the  sample  presented  in  the  first  interval.  On  "different" 
trials,  however,  the  sample  presented  in  the  second  interval  was  generated  by  deleting 
the first T msec of the sample from the first interval and appending T msec of independent
noise to the end. In the experiment described in Sec. 5.3, new noise was appended
at the beginning, middle, or end. The data from the "end" condition of that experiment
are very similar to those from the present experiment. The two conditions are similar in
that, in each, independent noise is appended at the end of the 150-msec burst. The two
conditions differ in that, for the "end" condition, samples in the two intervals are identical
for the duration T, while for the autocorrelation experiment, the beginning segments
differ. Since, as was pointed out in Sec. 5.3, differences between samples which occur
at the beginning or in the middle have only a small effect on discriminability, it is not
surprising that the "end" and the autocorrelation conditions are similar.

5.5.  Effect  of  decorrelation:  added  noise 

The  correlation  between  pairs  of  noise  samples  may  also  be  reduced  by  reducing 
the  proportion  of  variance  common  to  the  two  samples.  In  this  experiment,  "same"  tri¬ 
als  were  generated  by  presenting  identical  samples  of  noise  in  both  observation  inter¬ 
vals.  "Different"  trials  were  generated  by  adding  a  new,  independent,  sample  of  noise  to 
the  sample  which  had  been  presented  during  the  first  observation  interval.  The  relative 
levels  of  these  two  samples  determined  the  Pearson  product-moment  correlation 
coefficient  between  the  samples  presented  in  the  two  intervals.  The  overall  level  of  the 
samples  in  the  two  intervals  was  maintained  at  50  dB  SPL/Hz. 

For  all  four  of  the  durations  examined,  discriminability  decreased  as  the  correla¬ 
tion  increased.  The  decrease  was  slight  for  correlations  between  0.00  and  0.75,  and  very 
rapid  for  correlations  greater  than  0.75.  This  is  to  say  that  two  samples  are  easily 
discriminable  when  they  have  less  than  about  50%  common  variance. 
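The relation between relative level, correlation, and common variance can be sketched as follows. The mixing rule is an assumption (the text does not give the exact procedure): the second-interval sample is formed as r times the first sample plus sqrt(1 - r^2) times independent noise, which holds the expected overall power constant and yields an expected correlation of r. The proportion of common variance is then r squared, so a correlation of 0.75 corresponds to about 56% common variance.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def decorrelate(first, r, rng):
    """Mix `first` with independent unit-variance noise so that the expected
    correlation with `first` is r and the expected power is unchanged."""
    k = math.sqrt(1.0 - r * r)
    return [r * v + k * rng.gauss(0.0, 1.0) for v in first]

rng = random.Random(4)
r_target = 0.75
first = [rng.gauss(0.0, 1.0) for _ in range(5000)]
second = decorrelate(first, r_target, rng)
print("target r:", r_target)
print("obtained r:", round(pearson(first, second), 3))
print("common variance (r squared):", r_target ** 2)
```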

5.6.  Effect of the temporal position of a decorrelated segment

In this experiment "different" trials were generated by decorrelating, as in Sec. 5.5,
only  a  portion  of  the  150-msec  waveform  presented  in  the  second  interval.  The  decorre¬ 
lated  portion  was  located  at  either  the  beginning,  the  middle,  or  the  end  of  the 
waveform.  As  expected  from  previous  experiments,  discriminability  is  highly  dependent 
on  the  temporal  position  of  the  decorrelated  segment.  When  the  correlation  was  0.00, 
threshold  durations  were  approximately  25-,  60-,  and  90-msec  for  segments  at  the  end, 
middle  and  beginning.  When  the  correlation  was  0.75,  these  values  had  increased  to 
approximately  50-,  90-,  and  120-msec.  The  large  effect  of  temporal  position  which  we 
reported  previously  is  still  maintained  as  correlation  is  increased. 

5.7.  Effect  of  gap  duration  and  position 

In  this  experiment,  the  overall  duration  of  the  bursts  of  noise  in  a  pair  was  150 
msec  and  either  a  25  msec  segment  of  new  noise  was  appended  to  the  end  of  the  burst 
or  a  50  msec  sample  was  appended  to  the  beginning  of  the  burst.  A  silent  interval  or 
gap  replaced  a  portion  of  the  repeated  segment  either  immediately  following  or  immedi¬ 
ately  prior  to  the  appended  segment.  The  duration  of  the  gap  was  gradually  increased 
until only 5 msec of the repeated segment remained. Although discriminability
increased as gap duration increased, the presence of a brief repeated segment temporally
separated from the appended segment by 90-120 msec caused a large decrement in performance.
For example, when each burst in a pair consisted of a 5 msec repeated segment
followed by a 120 msec gap and a 25 msec appended segment, the average P(C) was
0.72. If the 5 msec repeated segment was not present and the pair of 25 msec bursts
was presented in isolation, the overall P(C) increased to 0.88. It would appear, then,
that interactions occurring after such a long silent interval are unlikely to be due to peripheral
sensory interactions, as was suggested by Hanna (1984).

6.  Information  integration:  multiple  observations  and  internal  noise  (D.  E.  Robin¬ 
son  and  B.  G.  Berg) 

The  work  described  in  this  section  began  with  two  major  goals.  The  first  is  to 
understand  the  processes  by  which  humans  integrate  information  over  time  or  over 
channels.  The  "multiple  look"  problem  is  the  basis  for  our  initial  work  in  this  area.  The 
basic  question  is,  "How  much  additional  information  is  gained  by  allowing  observers 
more  than  one  observation  in  a  detection  or  discrimination  task?"  The  second  goal  is  to 
develop  and  evaluate  models  of  "internal  noise."  The  amount  and  rate  of  improvement 
in  performance  with  an  increasing  number  of  observations  will  depend  not  only  upon 
the  amount  of  internal  noise,  but  upon  the  level  of  processing  at  which  the  internal 
noise  is  added.  The  following  section  describes  our  attempts  to  describe  the  improve¬ 
ments  that  occur  with  multiple  observations  and  to  model  the  processes  that  lead  to 
such  improvements.  [Portions  of  the  work  described  here  are  reported  in  Robinson  and 
Berg  (1986),  in  Berg  and  Robinson  (1987),  Berg  (1987),  and  Sorkin,  Robinson,  and  Berg 
(1987).]

6.1.  Internal noise model

Previous  research  has  demonstrated  that  performance  in  signal-in-noise  detection 
tasks  improves  as  listeners  are  allowed  more  observations  (Swets,  et  al.,  1959;  Swets 
and  Birdsall,  1978).  According  to  signal  detection  theory,  the  rate  of  improvement  is  a 
function  of  the  square-root  of  the  number  of  observations: 


$$ d'_n = \frac{m_2 - m_1}{\left( v_{ext}/n + v_{int}/n \right)^{1/2}} = n^{1/2}\, d'_1 \qquad (1) $$

where:

$d'_n$, d' after n observations

$d'_1$, d' for one observation

$m_1$, the mean of the noise-alone distribution

$m_2$, the mean of the signal-plus-noise distribution

$v_{ext}$, the common variance of the N and SN distributions

$v_{int}$, the variance of the internal noise.

Internal  noise  is  assumed  to  be  added  prior  to  the  formulation  of  a  decision  statis¬ 
tic.  This  derivation  of  the  square-root-of-n  rule  assumes  that  the  decision  statistic  is 
the  mean  of  the  n  likelihood  ratios  (or  any  monotonic  transformation  of  the  likelihood 
ratios)  obtained  from  the  n  observations.  Previous  research  has  supported  this  square- 
root-of-n  prediction.  However,  the  earlier  work  provided  only  a  limited  test  of  the 
model,  since  n  never  exceeded  six.  Our  research  has  extended  this  work  by  using  several 
paradigms  and  a  greater  number  of  observations. 

6.2.  Sequential  presentation  of  n  tones 

Consider  the  following  task.  There  are  two  probability  density  functions  on  fre¬ 
quency,  one  with  a  mean  of  1000  Hz,  one  with  a  mean  of  1100  Hz,  and  both  with  a 
common  standard  deviation  of  100  Hz.  On  each  trial,  n  independent  samples  are 
selected from one of the distributions and presented sequentially over headphones as n
50-msec tone bursts, separated by 50 msec silent gaps. The listener’s task is to decide
from which of the two distributions the n tones were sampled. Our results indicate that
listeners can approach the theoretical d' for n=1, but do not follow the square-root-of-n
rule, even for small n. Representative data for one subject are shown in Figure 6.2a.
The  solid  line  represents  the  predictions  of  Equation  1. 
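The prediction of Equation 1 for this task can be checked with a short Monte Carlo sketch of an observer who averages the n tone frequencies. The distribution means and standard deviation are the task values given above; the number of simulated trials, the seed, and the treatment of internal noise as an extra Gaussian variance added to every observation are assumptions. With no internal noise the simulated d' should track the square root of n.

```python
import math
import random

def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def simulated_d_prime(n, v_int=0.0, n_trials=4000, seed=0):
    """d' of an observer whose decision statistic is the mean of n observations.
    v_int is a per-observation internal-noise variance (0 gives the ideal observer)."""
    rng = random.Random(seed)
    sd_obs = math.sqrt(100.0 ** 2 + v_int)   # external sd of 100 Hz plus internal noise

    def decision_stats(mean_hz):
        return [sum(rng.gauss(mean_hz, sd_obs) for _ in range(n)) / n
                for _ in range(n_trials)]

    m_lo, v_lo = mean_and_var(decision_stats(1000.0))
    m_hi, v_hi = mean_and_var(decision_stats(1100.0))
    return (m_hi - m_lo) / math.sqrt(0.5 * (v_lo + v_hi))

for n in (1, 2, 4, 8, 12):
    print(n, round(simulated_d_prime(n), 2), round(math.sqrt(n), 2))
```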

One  interpretation  of  the  model,  described  by  Equation  1,  is  that  some  amount  of 
internal noise is added to each observation prior to generating a decision statistic.
Once the decision statistic is obtained, no additional variance is assumed. This model
allows  no  parsimonious  account  of  variance  introduced  by  uncertainty  of  the  decision 
criterion,  changes  in  response  bias,  or  memorial  factors  associated  with  the  decision 
statistic.  The  model  can  be  extended  by  allowing  additional  variance  after  the  genera¬ 
tion  of  the  decision  statistic.  This  "partitioned  variance"  model  is  represented  by  the 
equation: 


$$ d'_n = \frac{m_2 - m_1}{\left( v_{ext}/n + v_p/n + v_c \right)^{1/2}} \qquad (2) $$

where:

$v_p$, the variance of the peripheral noise

$v_c$, the variance of the central noise.


In  this  model,  internal  noise  is  added  at  two  stages:  (1)  at  the  periphery,  before  a 
decision  statistic  is  formed  and  (2)  centrally,  after  the  statistic  is  formed.  The  dashed 
line  in  Figure  6.2a  represents  the  function  obtained  for  subject  KN  by  a  least-squares 
estimate of the two parameters. Similar fits were obtained for the three other subjects.


[Figures 6.2a and 6.2b]
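The difference between the two models can be seen by evaluating Equations 1 and 2 side by side. The mean difference and external variance are the task values; the peripheral and central variances used here are hypothetical and are not estimates from the data. Equation 1 grows without bound as n increases, while Equation 2 approaches the ceiling (m2 - m1) / sqrt(v_c) set by the central noise.

```python
import math

def d_prime_eq1(n, delta, v_ext, v_int):
    """Equation 1: all internal noise is averaged along with the observations."""
    return delta / math.sqrt(v_ext / n + v_int / n)

def d_prime_eq2(n, delta, v_ext, v_p, v_c):
    """Equation 2: peripheral variance v_p is averaged, central variance v_c is not."""
    return delta / math.sqrt(v_ext / n + v_p / n + v_c)

delta = 100.0               # m2 - m1 in Hz (task value)
v_ext = 100.0 ** 2          # external variance in Hz^2 (task value)
v_p = 50.0 ** 2             # hypothetical peripheral variance
v_c = 40.0 ** 2             # hypothetical central variance

for n in (1, 2, 4, 8, 12):
    print(n,
          round(d_prime_eq1(n, delta, v_ext, v_p), 2),
          round(d_prime_eq2(n, delta, v_ext, v_p, v_c), 2))
print("ceiling of Equation 2:", delta / math.sqrt(v_c))
```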

There  is  a  second  method  of  estimating  the  two  parameters.  Consider  the  function 
relating the probability of reporting "lower distribution" to the mean frequency of the
sample.  An  ideal  observer  would  generate  a  step  function;  when  the  mean  frequency 
was  less  than  the  criterion,  the  ideal  would  report  "lower",  and  would  report  "higher" 
when  the  mean  exceeded  the  criterion.  Within  the  model,  any  deviation  from  this  step 
function  can  be  attributed  to  internal  noise.  The  variance  of  this  internal  noise  can  be 
estimated  by  fitting  a  normal  ogive  to  the  obtained  data.  Figure  6.2b  shows  the  best 
fitting  functions  for  subject  KN  for  sample  sizes  of  1,  4,  and  12. 
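One simple way to fit a normal ogive of this kind is sketched below. It is an assumption, not necessarily the procedure used for Figure 6.2b: response proportions are converted to z-scores and regressed on frequency, so that the slope gives the standard deviation and the intercept gives the criterion. The example data are noise-free values generated from a hypothetical criterion of 1050 Hz and standard deviation of 60 Hz.

```python
from statistics import NormalDist

def fit_ogive(freqs, p_lower):
    """Fit P("lower") = Phi((c - f) / s) by unweighted regression of z on f.
    Returns (c, s): the criterion and the standard deviation of the ogive."""
    nd = NormalDist()
    # Proportions of exactly 0 or 1 have no finite z-score and are skipped.
    pts = [(f, nd.inv_cdf(p)) for f, p in zip(freqs, p_lower) if 0.0 < p < 1.0]
    n = len(pts)
    mf = sum(f for f, _ in pts) / n
    mz = sum(z for _, z in pts) / n
    slope = (sum((f - mf) * (z - mz) for f, z in pts)
             / sum((f - mf) ** 2 for f, _ in pts))
    intercept = mz - slope * mf
    s = -1.0 / slope          # z = c/s - f/s, so the slope is -1/s
    c = intercept * s         # and the intercept is c/s
    return c, s

true_c, true_s = 1050.0, 60.0                      # hypothetical values
freqs = [950.0 + 25.0 * k for k in range(9)]       # 950 to 1150 Hz
probs = [NormalDist().cdf((true_c - f) / true_s) for f in freqs]
c_hat, s_hat = fit_ogive(freqs, probs)
print(round(c_hat, 2), round(s_hat, 2))
```

A shallower ogive yields a larger `s`, which is the sense in which the slope of the function reflects the total internal variance.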

The slopes of the functions increase with increasing n, indicating that the total
internal variance is decreasing. In this manner, estimates of the total internal variance
were obtained for each sample size (n = 1, 2, 3, 4, 6, 8, 10, and 12). Estimates of the peripheral
and central variance were obtained by a least-squares fit to the equation:

[Equation 3]

A comparison of the parameter estimates obtained with Equations 2 and 3 showed
remarkably  good  agreement  for  all  four  subjects.  The  partitioned  variance  model  thus 
provides  a  reasonably  good  account  of  the  data,  and  represents  an  improvement  in  the 
formal  treatment  of  internal  variance. 
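The second estimation method can be sketched under one explicit assumption: that the total internal variance at sample size n has the form implied by Equation 2, namely v_p / n + v_c. Under that assumption the total variance is linear in 1/n, so the slope of a straight-line fit estimates the peripheral variance and the intercept estimates the central variance. The input values below are synthetic, generated from hypothetical variances, and are not data from the experiment.

```python
def fit_partition(ns, v_total):
    """Least-squares fit of v_total(n) = v_p / n + v_c; returns (v_p, v_c)."""
    x = [1.0 / n for n in ns]
    mx = sum(x) / len(x)
    my = sum(v_total) / len(v_total)
    v_p = (sum((a - mx) * (b - my) for a, b in zip(x, v_total))
           / sum((a - mx) ** 2 for a in x))
    v_c = my - v_p * mx
    return v_p, v_c

ns = [1, 2, 3, 4, 6, 8, 10, 12]                   # sample sizes used in the text
true_v_p, true_v_c = 2500.0, 1600.0               # hypothetical variances (Hz^2)
v_total = [true_v_p / n + true_v_c for n in ns]   # synthetic, noise-free values
v_p_hat, v_c_hat = fit_partition(ns, v_total)
print(round(v_p_hat, 1), round(v_c_hat, 1))
```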


6.3.  Simultaneous presentation of n tones

This  work  was  conducted  in  collaboration  with  Dr.  Wesley  Grantham  of  the  Bill 
Wilkerson  Hearing  and  Speech  Center,  of  Vanderbilt  University.  Grantham  conducted 
an experiment similar to that described in Sec. 6.2, with the exception that the n tones
were  added  and  presented  simultaneously,  rather  than  sequentially.  Data  could  be  rea¬ 
sonably  described  by  Equation  1.  That  is,  little  or  no  central  variance  was  required. 
Preliminary  conclusions  seemed  to  indicate  a  fundamental  difference  between  the  pro¬ 
cessing  of  information  presented  sequentially  and  information  presented  simultaneously. 
However,  this  difference  can  be  attributed  to  a  procedural  difference  between  the  two 
studies.  For  technical  reasons,  tones  were  sampled  without  replacement  for  simultane¬ 
ous  presentation,  whereas  sampling  was  done  with  replacement  for  sequential  presenta¬ 
tions.  Equation  1  assumes  independent  sampling  and  is  valid  for  the  sequential  tones 
study,  but  a  correction  factor  is  required  to  obtain  predictions  when  sampling  is  done 
without  replacement.  Obtaining  fits  to  Grantham’s  data  using  this  correction  factor 
indicated  a  less  than  optimal  growth  rate  in  d'  for  all  three  subjects,  and  required  the 
addition  of  central  variance.  A  comparison  of  estimates  of  peripheral  and  central  vari¬ 
ance  across  the  two  studies  showed  relatively  good  agreement. 
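The text does not state which correction factor was applied. As an illustration only, the sketch below uses the standard finite-population correction for the variance of a mean drawn without replacement; the population size `pop_size` is hypothetical. Because sampling without replacement reduces the variance of the sample mean, the ideal d' grows faster than the square root of n, so data that appear to follow Equation 1 can in fact fall short of the ideal.

```python
import math

def var_mean_with_replacement(sigma2, n):
    """Variance of the mean of n independent draws."""
    return sigma2 / n

def var_mean_without_replacement(sigma2, n, pop_size):
    """Variance of the mean of n values drawn without replacement from a finite
    population of pop_size values whose population variance is sigma2."""
    return (sigma2 / n) * (pop_size - n) / (pop_size - 1)

delta = 100.0        # mean difference in Hz (value from the sequential task)
sigma2 = 100.0 ** 2  # external variance in Hz^2 (value from the sequential task)
pop_size = 20        # hypothetical number of tones in the finite population

for n in (1, 2, 4, 8, 12):
    d_with = delta / math.sqrt(var_mean_with_replacement(sigma2, n))
    d_without = delta / math.sqrt(var_mean_without_replacement(sigma2, n, pop_size))
    print(n, round(d_with, 2), round(d_without, 2))
```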


6.4.  Distribution  of  internal  noise  over  the  tonal  sequence 

An  important  question  raised  by  our  general  model  is  whether  information  from 
each  tone  in  a  tonal  sequence  is  equally  weighted  in  determining  an  observer’s  decision. 
We have developed a technique for assessing how internal noise is distributed over the N
elements of a tonal sequence. In terms of the model as described in Eq. 2, the amount
of  information  obtained  from  different  tones  in  an  N-element  sequence  will  be  reflected 
in  the  variance  of  the  internal  noise  added  at  each  temporal  position.  If  a  particular 
temporal  position  contributes  little  to  the  final  decision,  that  position  will  be  found  to 
have  a  large  amount  of  internal  noise  associated  with  it.  If,  on  the  other  hand,  a  par¬ 
ticular  element  contributes  a  great  deal,  that  element  will  have  less  internal  noise  asso¬ 
ciated  with  it.  Data  from  an  auditory  experiment  were  analyzed  to  assess  how  internal 
noise is distributed over successive temporal positions. Over many thousands of trials
we store the frequency of the tones actually presented in the ith temporal position (i =
1, 2, ... n; where n is the number of tones in the sequence). We then partition these
stored frequencies into bins of arbitrary width. The purpose of our analysis is to keep
track  of  the  number  of  trial  events  on  which  the  frequency  of  the  ith  element  was  in 
each  frequency  bin.  For  each  bin  and  each  temporal  position,  we  then  compute  the 
probability that the subject responds that the sequence came from the lower distribution.
Cumulative normal distributions are then fit to the resulting ogives. The standard
deviation of the best fitting normal distribution is then an estimate of the
standard deviation of the total internal noise limiting performance at each display position.
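The binning analysis described above can be sketched on simulated data. The observer here is hypothetical: it responds "lower" when a weighted mean of the tone frequencies, plus central noise, falls below 1050 Hz, and the position weights, central noise, bin width, and minimum bin count are all assumptions. The ogive fit uses an unweighted regression of z-scores on bin centres, which is one simple choice among several.

```python
import random
from statistics import NormalDist

def simulate(n_trials, weights, rng, central_sd=20.0):
    """Return (frequencies, said_lower) pairs for a hypothetical observer."""
    total_w = sum(weights)
    trials = []
    for _ in range(n_trials):
        mean = rng.choice([1000.0, 1100.0])
        freqs = [rng.gauss(mean, 100.0) for _ in weights]
        stat = sum(w * f for w, f in zip(weights, freqs)) / total_w
        stat += rng.gauss(0.0, central_sd)
        trials.append((freqs, stat < 1050.0))
    return trials

def position_sd(trials, position, bin_width=50.0, min_count=50):
    """Bin the frequency at one temporal position, compute P("lower") in each
    bin, and return the standard deviation of the fitted cumulative normal."""
    bins = {}
    for freqs, said_lower in trials:
        b = int(freqs[position] // bin_width)
        hits, count = bins.get(b, (0, 0))
        bins[b] = (hits + said_lower, count + 1)
    nd = NormalDist()
    pts = []
    for b, (hits, count) in bins.items():
        p = hits / count
        if count >= min_count and 0.0 < p < 1.0:
            pts.append(((b + 0.5) * bin_width, nd.inv_cdf(p)))
    n = len(pts)
    mf = sum(f for f, _ in pts) / n
    mz = sum(z for _, z in pts) / n
    slope = (sum((f - mf) * (z - mz) for f, z in pts)
             / sum((f - mf) ** 2 for f, _ in pts))
    return -1.0 / slope       # P("lower") falls as frequency rises

rng = random.Random(5)
weights = (0.3, 0.2, 0.5)     # hypothetical: last tone weighted most, middle least
trials = simulate(6000, weights, rng)
sds = [position_sd(trials, i) for i in range(len(weights))]
print([round(s) for s in sds])
```

In this sketch the position with the smallest weight produces the shallowest ogive and therefore the largest estimated standard deviation, which matches the interpretation given in the text.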

Figure  6.4  shows  the  standard  deviation  of  the  internal  noise  as  a  function  of  tem¬ 
poral  position.  The  parameter  on  the  figure  is  the  total  sequence  length,  n.  If  each  ele¬ 
ment  in  the  sequence  contributed  equally  to  the  final  decision,  the  lines  in  Figure  6.4 
would  be  horizontal.  It  is  clear  that  the  last  tone  in  a  sequence  contributes  more  to  the 
final  decision  than  do  tones  in  the  middle,  which  contribute  less  than  those  near  the 
beginning  of  the  sequence. 

[Figure 6.4]


6.5.  Additive or multiplicative internal noise?

A  common  assumption  of  the  models  discussed  above  is  that  external  and  internal 
variance are independent and additive. This assumption was tested by using a within-subjects
factorial design consisting of two levels of external variance,

$$ (v_{ext})^{1/2} = 100 \text{ Hz or } 150 \text{ Hz}, $$

and two levels of the mean frequency difference between the two distributions,

$m_2 - m_1$ = [illegible] Hz or [illegible] Hz.

For  each  of  the  four  conditions,  the  experimental  procedure  was  identical  to  the 
sequential tone paradigm described previously. Data obtained from four listeners indicate
that estimates of internal variance are not affected by changes in the mean frequency
difference for a fixed level of $v_{ext}$. However, estimates of internal variance
increase when $v_{ext}$ is increased. This increment in internal variance is obtained for both
levels of the mean frequency difference. These data violate the assumption of additivity,
and suggest that internal variance increases as a function of the external variance.


7.  Facility  Development 

During  the  present  grant  period  the  experimental  facilities  in  the  Hearing  and 
Communication  Laboratory  have  been  substantially  improved.  Originally  HCL  had  a 
single  11/23  computer  which  was  (and  still  is)  heavily  committed  to  running  on-line 
experiments.  A  PDP  11/83  was  installed  to  assist  with  off-line  support  of  laboratory 
activities  such  as  stimulus  generation,  data  analysis,  program  development  and  signal 
processing.  A  Digisound-16  D/A,  A/D  converter  was  installed  and  software  was 
developed, so that synthesis of the sounds for the vigilance experiments could be done
on  the  11/83. 

In  addition,  the  grant  provided  savings  that,  together  with  other  funds,  allowed  us 
to purchase two Apollo workstations. These workstations are each powerful minicomputers
which run UNIX. We participated in establishing an Apollo Domain ring at
Indiana  together  with  an  interdisciplinary  group  of  investigators  in  Computer  Sciences, 
Linguistics  and  Mathematics,  all  of  whom  had  interests  in  speech  and  auditory  process¬ 
ing.  We  have  been  able  to  share  software  applicable  to  the  needs  of  this  group,  in  par¬ 
ticular  statistical  and  signal  processing  packages,  digitizing  facilities,  and  speech  recog¬ 
nition  tools  and  algorithms.  Several  new  workstations  have  now  been  added  to  this 
network  by  the  AFOSR  supported  Institute  for  the  Study  of  Human  Capabilities. 
Interests  between  our  labs  and  the  Institute  overlap  considerably,  and  the  communica¬ 
tion  with  these  additional  investigators  has  further  enhanced  the  usefulness  of  the 
Apollo  system.  We  are  fortunate  to  have  been  able  to  build  a  state-of-the-art 
psychoacoustic  and  speech  laboratory  over  the  past  few  years.  That  system  now 
enables  us  to  conduct  a  wide  range  of  extensions  of  the  research  supported  here, 
without  need  for  additional  apparatus,  at  least  in  the  near  future. 